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ABSTRACT 

This paper describes the prototype expert 
systems that diagnose the Distribution and 
Switching System I and II (DSS1 and DSS2), 
Statistical Multiplexers (SM), and Multiplexer 
and Demultiplexer systems (MDM) at the NASA 
Ground Terminal (NGT). A system level fault 
isolation expert system monitors the activities 
of a selected data stream, verifies that the fault 
exists in the NGT and identifies the faulty 
equipment. Equipment level fault isolation 
expert systems will be invoked to isolate the 
fault to a Line Replaceable Unit (LRU) level. 

Input and sometimes output data stream 
activities for the equipment are available. The 
system level fault isolation expert system will 
compare the equipment input and output 
status for a data stream and perform loopback 
tests (if necessary) to isolate the faulty 
equipment. The equipment level fault isolation 
system utilizes the process of elimination and/or 
the maintenance personnel's fault isolation 
experience stored in its knowledge base. The 
DSS1, DSS2 and SM fault isolation systems, 
using the knowledge of the current equipment 
configuration and the equipment circuitry, will 
issue a set of test connections according to the 
predefined rules. The faulty component or 
board can be identified by the expert system by 
analyzing the test results. The MDM fault 
isolation system correlates the failure symptoms 
with the faulty component based on 
maintenance personnel experience. The faulty 
component can be determined by knowing the 
failure symptoms. 

The NGT fault isolation prototype is 
implemented in Prolog, C and VP-Expert, on an 
IBM AT compatible workstation. The DSS1, 
DSS2, SM, and MDM equipment simulators are 


implemented in PASCAL. The equipment 
simulator receives connection commands and 
responds with status for the expert system 
according to the assigned faulty component in 
the equipment. The DSS1 fault isolation expert 
system was converted to C language from VP- 
Expert and integrated into the NGT automation 
software for offline switch diagnoses. 

Potentially, the NGT fault isolation algorithms 
can be used for the DSS1, SM, and MDM located 
at Goddard Space Flight Center (GSFC). The 
prototype could be a training tool for the NGT 
and NASA Communications (Nascom) Network 
maintenance personnel. 

1.0 INTRODUCTION 

This section will describe the background, 
problem, objective, and scope of this paper. 

1.1 Background 

The NGT is located with the ground terminal 
portion of the Tracking and Data Relay Satellite 
System (TDRSS) at White Sands, New Mexico 
The primary role of the NGT is to serve as the 
interface for communication between the 
TDRSS and NASA facilities at Goddard Space 
Flight Center (GSFC) and Johnson Space Center 
(JSC). The primary functions of the NGT are 
data transport, data quality monitoring, and 
line outage recording and data rate buffering. 

In order to meet the future (early 1990's) 
workload of multiple Tracking and Data Relay 
Satellite (TDRS) support and user support 
requirements, an NGT Automation (NGTA) 
project was completed at the end of 1989 
(GSFC, STDN No. 528, 1986). The NGTA provides 
the capabilities to automatically configure the 
major NGT subsystems and to monitor their 
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health and status using high-speed scheduling 
messages from the Network Control Center 
(NCC). The major communication equipment 
includes the Distribution and Switching System I 
and II (DSS1 and DSS2), Statistical Mulitplexers 
(SM) and Multiplexer and Demulitplexer 
systems (MDM) as depicted in Figure 1. They 
are controlled and monitored by the DSS1 
Interface Processor (IP), SM and DSS2 Interface 
Processor (SMD IP), MDM Automatic Control 
System (MACS) and NGT Control and Status 
System (NCSS) as depicted in Figure 2. 

The NGT communication equipment trouble- 
shooting is done manually, although much of 
the equipment health status is monitored auto- 
matically. The manual troubleshooting 
requires a skilled technician and is time 
consuming. An automated fault isolation 
system will significantly reduce the equipment 
down time and the required technician skill 
level. The rule based expert system technique 
has been used to assist fault isolation tasks for 
various GSFC supported projects (Erikson and 
Hooker, 1989; Luczak, et al., 1989; Lowe, et al., 
1987). 

1.2 Problem 

The communication equipment at NGT was 
built in the late 1970‘s or early 1980's. This 
equipment does not allow the internal signal 
status to be monitored remotely. For example, 
the DSS1 sends to the DSS1 IP the data and clock 
present status at the input and output ports 
and also sends the result of the comparison of 
the input and output port signals. Only this 
status information received by the IP is 
available for the automated fault isolation 
process. There is no status available between 
the input and output ports The internal signal 
monitoring capabilities are also limited for the 
DSS2, MDM, and SM. With limited status 
information, it is not possible to directly 
identify a faulty LRU (usually a circuit board) 
within the equipment 

NASA is building the Second TDRS Ground 
Terminal which will replace the NGT functions 
during the middle 1990‘s. It is not cost effective 
to enhance the NGT communication hardware 
to provide more internal monitoring 
capabilities for the fault isolation purpose. 
Therefore, the proposed automated fault 


isolation system shall use only the existing 
computer hardware capability and shall not 
impact the performance of the other functions. 

With these limitations, it is a challenge to 
develop a low cost automated fault isolation 
system for the NGT in a timely manner. The 
specific objectives of the NGT automated fault 
isolation prototype follow. 

1.3 Objectives 

The major objective of this study is to develop 
an automated fault isolation system prototype 
using a rule based expert system to prove the 
feasibility of building a low cost fault isolation 
system that meets all the restrictions as 
previously described. The secondary goal is to 
eventually convert the prototype to an 
operational system. The prototype can also be 
used for the following: 

• to explain the fault isolation approach 
and methodology as a training tool 

• to verify the methodology during its 
development 

• to identify the operator interface 
requirements. 

1.4 Scope 

The fault isolation prototype was developed for 
the DSS1, DSS2, MDM, and SM. The interface 
processors for the equipment were not 
included. 

2 0 APPROACH 

The NGT automated fault isolation concept 
takes the top down approach. The system level 
fault isolation expert system will first verify that 
the fault indeed occurred in the NGT and will 
identify the faulty equipment. It will then 
invoke an equipment level fault isolation 
expert system to identify the faulty component 
at the LRU level. There is an equipment level 
fault isolation expert system for each piece of 
equipment. 

The system level fault isolation expert system 
will compare the equipment input and output 
status for a data stream and perform loopback 
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tests (if necessary) to isolate the faulty 
equipment. The equipment level fault isolation 
system utilizes the process of elimination and/or 
the maintenance personnel fault isolation 
experience stored in its knowledge base. The 
DSS1, DSS2 and SM fault isolation systems, 
using the knowledge of the current equipment 
configuration and the equipment circuitry, will 
issue a set of test connections and analyze test 
results according to the predefined rules. 

For instance, a test data path can be chosen so 
that only one component on the faulty data 
stream is not shared by both paths (it is 
substituted by another component which is 
known to be good). If the data path is good, it 
can be concluded that the component not used 
for the test is bad. Otherwise, another 
component in the faulty data stream is bad and 
more tests are required. Using this substitution 
and elimination method for all components in 
the faulty data stream, the faulty component or 
board can be identified. The MDM fault 
isolation system correlates the failure symptoms 
with the faulty component based on the 
maintenance personnel experience. The faulty 
component can be determined by knowing the 
failure symptoms. 

The system level fault isolation software can 
reside in the NCSS computer since it has the 
activity status of all equipment in the data 
stream. The equipment level fault isolation 
system can be distributed to the corresponding 
IP which has the capability to monitor the data 
stream activity status (i.e. clock or data present), 
and to issue test commands for additional 
information. The rules to compare the 
equipment status and to determine test cases 
are simple. The additional code allocated to 
each IP will not impact the NGTA performance. 
This approach provides an efficient method to 
achieve the NGT automated fault isolation 
goals under the restrictions previously 
described. The detailed algorithms to identify 
the faulty equipment and component have 
been prototyped and the results are presented 
in the following paragraphs. 

3.0 FAULT ISOLATION ALGORITHMS 

Both the system level and equipment level fault 
isolation system prototypes were developed. 
For each equipment level fault isolation system, 


an equipment simulator was built to receive 
test commands and to respond with status 
messages according to the assigned failure. 

3.1 System Level Fault Isolation 

The NGTA data base in the NCSS stores the data 
stream service configuration and scheduling 
information which is available to the expert 
system. From this, the expert system can find all 
the equipment used to support the data 
stream. The expert system will use the 
information for fault isolation from any 
equipment in the data stream that may cause 
an alarm or complaint. This system level NGT 
fault isolation can be initiated by the operator. 

The system level fault isolation expert system 
will isolate the faulty equipment in the NGT if 
the data stream status is good at the equipment 
input port and is bad at the output port. The 
system will conclude that there is no fault in the 
NGT if all the status along a data stream in the 
NGT are bad and if the service is during the 
spacecraft acquisition, reacquisition, ground 
equipment reconfiguration, service-to-service 
handovers or first several minutes of service. 
During these periods, the data stream is not 
stable (Miksell, et a!., 1987). 

The system will configure the NGT ground 
equipment to perform loopback tests if the 
input data stream status is good and the output 
status is not available. Data stream signals at 
the MDM Output Controller (OC) output port, 
Nascom's Domestic Satellite (DOMSAT) 
downlink, or SM transmitter output are 
available for loopback tests through switching. 
If the loopback test shows the data stream is 
good, the fault is not in the NGT. 

The system level fault isolation system will 
invoke the equipment level fault isolation 
system to identify the fault to the LRU level. 
The following paragraphs describe the 
equipment level fault isolation principles. 

3.2 Distribution and Switching System I 

The DSS1 provides buffering and switching for 
192 digital low-data rate signals that pass 
through the NGT. The design of the DSS1 is 
based on a Clos-type nonblocking switch array 
that consists of stage A, B, and C cards. One test 
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card and 8 input cards are also used to monitor 
switch status and to generate test data. 

The Input, A, B, and C cards consist of decoders 
and selectors (4 to 1 selection). The health of a 
card is determined by testing the health of the 
decoder and selector used for a connection. 
The status of a selector can be determined by 
making a new connection that uses a different 
selector but otherwise the same components in 
the A, B, and C cards as the faulty connection. 
This can be done since a decoder controls a 
group of selectors. If the new connection is 
good, the selector on the faulty connection 
path is bad. Otherwise, another component on 
the path is bad and the decoder needs to be 
tested next. The new connection shall consist 
of all new components except the decoder. If 
the new connection is bad, the decoder is bad. 
Otherwise, the decoder and selector of another 
card on the connection path shall be tested. 
Through the process of substitution and 
elimination the faulty component and the 
faulty card can be determined 

The expert system, with the knowledge of 
switch circuitry and the current connections, 
will find the proper free ports to support tests, 
issue commands to conduct the tests, and 
interpret the test results. 

3.3 Distribution and Switching System II 

The DSS2 consists of 48 data/clock pair inputs 
and 40 data/clock outputs. The major line 
replaceable modules of the DSS2 are as follows: 
input module, switch module, multiplexer 
module, output module, peripheral electronics 
assembly, and computer assembly. 

The clock and data status are monitored at the 
input module and the output module. An 
example of an algorithm to isolate the faulty 
module if the data status is good at the input 
but bad at the output follows. 

A connection test is made by connecting the 
same input to a proper output such that the 
faulty connection and test connection share the 
same switch module and the same switch Large 
Scale Integration (LSI) circuitry in the switch 
module. If the new connection is good, the 
faulty module is either the multiplexer module 


or the output module. Otherwise, the faulty 
module is either the input module or the switch 
module. 

To distinguish whether the fault is the input 
module or the switch module, another test is 
required. The new test will use the same input 
module and switch module as the faulty 
connection, but will select an output that uses a 
different switch LSI in the same switch module. 
If the new connection is good, the switch LSI is 
faulty, otherwise the fault is either the input 
module or the fan-out board in the switch 
module. More tests are needed to complete 
the fault isolation process. 

The DSS2 fault isolation expert system has the 
knowledge base and rules to perform the 
process of elimination as previously described 
and will isolate the faulty module or modules. 

3.4 Multiplexer and Demultiplexer 

This section describes the fault isolation 
algorithms for the multiplexer and 
demultiplexer portions of the MDM separately. 

3.4.1 Multiplexer 

The Multiplexer systems at the NGT consist of 
100 Input Terminal Units (ITU) and Triple 
redundant Output Controllers (OC). The 
Multiplexer system is fully redundant. Both 
prime and alternative systems process and 
transmit the composit data streams 
simultaneously. The fault isolation processes 
includes loopback tests and direct 
interpretation of the failure symptoms. 

Only the input clock status at the ITU for a data 
stream in the Multiplexer system is monitored. 
Loopback tests are required to determine that 
the fault is indeed in the Multiplexer as descried 
3.1. If the fault is determined to be in the 
Multiplexer, and GSFC or JSC experience 
problems on all the data streams from the NGT, 
the fault is in the OC. If only one data stream 
has problems, the fault is in the corresponding 
ITU. There are three boards in an ITU and 7 
boards in an OC. Further fault isolation to the 
board level requires the knowledge that 
associates the failure symptoms and/or alarm 
messages to the most likely faulty board. 
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The expert system, with the knowledge of the 
multiplexer configurations from the service 
schedules and the failure symptom correlations 
from maintenance personnel experiences, will 
configure the loopback tests and interact with 
the operator to isolate the faulty board. 

3.4.2 Demultiplexer 

A demultiplexer consists of one Input Controller 
(1C) and 30 Output Terminal Units (OTU). There 
are two Demultiplexer systems at the NGT to 
process the signals from GSFC and JSC 
separately. The third one is used as a spare 
and/or to support a recorder playback function. 

The Demultiplexer fault isolation is based on 
the error messages the demultiplexer generates 
and the failure symptoms the operator 
observes. Instead of loopback tests, the spare 
demultiplexer is used to support fault isolation. 
The composite data stream from GSFC or JSC 
will be routed to the spare unit and 
demultiplexed, and the output data stream 
status will be monitored If the data stream is 
good then the fault is in the NGT. There are 
three logic cards in an OTU and four logic cards 
in an 1C. If only one data stream has problems, 
the fault is in the associated OTU. If all the data 
streams have problems, the fault is in the 1C. 
Based on the failure symptoms the most likely 
faulty logic card can be isolated. 

3.5 Statistical Multiplexer 

The SM at the NGT consists of a transmit section 
and a receive section. Four input ports are 
available for the transmit section and four 
output ports are available for the receive 
section. A spare SM at the NGT is available for 
backup and fault isolation support. 

There is a receive module and a transmit 
multiplex module within the transmit section. 
There are demultiplex modules, a patten 
detector module, frequency synthesizers, and 
error drivers within the receive section. The 
transmit/receive module and high speed data 
driver boards are shared by both sections. 
During normal operations, the composite data 
stream output from the transmit section is 
looped back to the receive section of the same 
unit, and the composite data stream from the 


SAT is looped back to the receive section of the 
spare unit for data status monitoring. 

When the receive sections of both units indicate 
data or clock loss, the fault is in the transmit 
section. The service will be restored by 
switching the input data streams to the spare 
unit. The faulty module will be determined 
after the service is over and during equipment 
free time. 

For instance, a transmit section fault can be 
determined by feeding the test data to all four 
input ports and looping the composite data to 
the receive section of the demutiplexer. If all 
ports at the receive section lose clock and data, 
the fault is in the transmit/receive module. If 
only one port at the receive section loses clock 
and data, the fault is in the corresponding 
transmit multiplex module. By examining the 
failure impacts the faulty module can be 
identified. 

The expert system will interact with the 
maintenance personnel to setup the test data 
generator and to configure the loop back tests. 
The expert system will monitor the data stream 
status and determine the faulty module. 

4.0 PROGRESS AND PROTOTYPE USAGE 

The system level fault isolation expert system 
was implemented in dBASE III to take 
advantage of a relational file structure for 
storing the service schedules and configuration 
information, and for the ease in data input. 
The DSS1 fault isolation expert system was 
implemented in VP-Expert, a rule-based expert 
system development tool (Sawyer, et al., 1987). 
The DSS2, MDM, and SM fault isolation expert 
systems were written in Turbo Prolog. The 
simulators for the equipment were written in 
Turbo Pascal. All software was implemented on 
an IBM AT compatible workstation. The system 
level and the DSS1 equipment level fault 
isolation programs were converted to C 
language for possible integration with the 
operational NGTA software. 

The user can test the prototype by assigning 
one communication equipment fault at the 
board or module level, by configuring a service 
identifying the equipment involved and ports 
used, and by starting data transmission. The 
prototype will issue the appropriate error 
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messages to indicate a system failure, then the 
user can initiate the fault isolation procedure. 
The system level fault isolation system will 
display the service configuration graphically 
along with all the available color coded clock 
and data status. The faulty equipment will be 
highlighted in red. 

The equipment level fault isolation system will 
be invoked by the system level fault isolation 
system. The system will display a flow diagram 
to show all the modules in the equipment 
needed to support the faulty service along with 
the available status. The prototype will also 
show the test data path and status on the same 
diagram but in a different color. The 
components shared by both paths can be seen 
clearly. The component determined to be 
healthy after a test will turn green. The user is 
able to observe the process of elimination step 
by step until the faulty board or module is 
isolated. The test connections and results are 
also explained in the message window. The 
fault isolation result can be compared to the 
fault initially assigned to verify the success of 
the fault isolation system. 

A series of tests were performed to debug the 
software and to verify the fault isolation 
algorithms. The fault isolation prototype 
successfully demonstrates the automated fault 
isolation capabilities. The DSS1 fault isolation 
algorithm was actually tested at the NGT. Chips 
in the A card, Input card and Test card were 
purposely damaged to create faults. A set of 
test connections were issued manually 
according to the fault isolation algorithm. By 
analyzing the connection test results, the faulty 
boards were correctly identified. 

5.0 CONCLUSIONS AND RECOMMENDATIONS 

The NGT fault isolation prototype does 
demonstrate the feasibility to apply expert 
system techniques to perform the automated 
fault isolation tasks with minimum operator 
intervention. The conclusions of the prototype 
implementation are as follows: 

• The prototype proves that it is feasible 
to develop automated fault isolation 
algorithms for the NGT communication 
equipment. 

• The fault isolation algorithms are 
simple and effective. There is no 


hardware enhancement required to 
implement the algorithms. 

• The fault isolation system software can 
be distributed and integrated into the 
NGTA subsystems to perform automatic 
test configuration, status monitoring 
and interpretation, and fault isolation 
with minimal impacts to the system 
response time. 

• The prototype can be used as a training 
tool to explain the fault isolation 
algorithms. 

The prototype demonstrates the feasibility and 
cost-effectiveness to add the automated fault 
isolation capabilities to the NGT. It is 
recommended that the fault isolation 
capabilities be implemented at the NGT and 
other NASA facilities with similar equipment 
such as Nascom and Second TDRS Ground 
Terminal. 
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